ABSTRACT


Floating point (FP) multiplication is widely used in a large set of scientific and signal processing
computations, where it is one of the most common arithmetic operations. A high speed double
precision floating point multiplier is implemented on a Virtex-6 FPGA. The proposed design is
compliant with the IEEE-754 format and handles overflow, underflow, rounding and various
exception conditions. The design achieves an operating frequency of 414.714 MHz with an area of
648 slices.
I.  Introduction

Real numbers represented in binary format are
known as floating point numbers. The IEEE-754
standard classifies floating point formats into
binary and decimal interchange formats. Floating
point multipliers are very important in DSP
applications.

This paper focuses on the double precision
normalized binary interchange format. Figure 1
shows the IEEE-754 double precision binary
format representation. The sign (S) is represented
with one bit, and the exponent (E) and fraction (M or
mantissa) are represented with eleven and fifty-two
bits respectively. For a number to be normalized, it
must have a '1' in the MSB of the significand and a
biased exponent greater than zero and smaller than
2047. The real number is represented by
equations (1) & (2).




Figure 1. IEEE-754 double precision floating point format


$\mathrm{Value} = (-1)^{S} \times M \times 2^{E}$
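The field layout of Figure 1 can be illustrated with a minimal Python sketch. It assumes the standard IEEE-754 encoding, in which the significand is 1.M (implicit leading one) and the stored exponent carries a bias of 1023; the helper names `unpack_double` and `value_of` are hypothetical and are not part of the design described in this paper.

```python
import struct

def unpack_double(x: float):
    """Split a binary64 value into its sign, exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction

def value_of(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized value as (-1)^S * 1.M * 2^(E - 1023)."""
    significand = 1 + fraction / 2**52    # implicit leading one
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023)

s, e, m = unpack_double(-6.5)
print(s, e, hex(m))
print(value_of(s, e, m))
```

For -6.5 (that is, -1.625 x 2^2) the fields are sign 1, biased exponent 1025 and a fraction whose top bits are 101.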

Floating point implementation on FPGAs has been
of interest to many researchers. In [1], an
IEEE-754 single precision pipelined floating point
multiplier is implemented on multiple FPGAs (4
Actel A1280). Nabeel Shirazi, Walters, and Peter
Athanas implemented a custom 16/18 bit three
stage pipelined floating point multiplier that
doesn't support rounding modes [2]. L. Louca,
T. A. Cook, and W. H. Johnson [3] implemented a
single precision floating point multiplier using a
digit-serial multiplier on an Altera FLEX 8000. The
design achieved 2.3 MFlops and doesn't support
rounding modes. In [4], a parameterizable floating
point multiplier is implemented using a five stage
pipeline, Handel-C software and a Xilinx XCV1000
FPGA. The design achieved 28 MFlops. The
floating point unit in [5] is implemented using the
primitives of a Xilinx Virtex II FPGA. The design
achieved an operating frequency of 100 MHz with a
latency of 4 clock cycles. Mohamed Al-Ashrafy,
Ashraf Salem, and Wagdy Anis [6] implemented an
efficient IEEE-754 single precision floating point
multiplier targeted at a Xilinx Virtex-5 FPGA. The
multiplier handles the overflow and underflow
cases but rounding is not implemented. The design
achieves 301 MFLOPs with a latency of three clock
cycles. The multiplier was verified against the
Xilinx floating point multiplier core.
II.  Floating  Point  Multiplication  Algorithm

Multiplying two numbers in floating point format
is done by

1. Adding the exponents of the two numbers, then
subtracting the bias from the result.

2. Multiplying the significands of the two numbers.

3. Calculating the sign by XORing the signs of the
two numbers.

In order to represent the multiplication result as
a normalized number there should be a 1 in the MSB
of the result (leading one).


Figure  3.  Multiplier  structure  with  rounding  and  exceptions 


The  following  steps  are  necessary  to  multiply 
two  floating  point  numbers. 

1. Multiplying the significands, i.e. (1.M1 * 1.M2)

2. Placing the binary point in the result

3. Adding the exponents, i.e. (E1 + E2 - Bias)

4. Obtaining the sign, i.e. s1 xor s2

5. Normalizing the result, i.e. obtaining a 1 at
the MSB of the result's significand

6. Rounding the result to fit in the available bits

7. Checking for underflow/overflow occurrence
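The seven steps above can be sketched in Python as follows. This is an illustrative model only, not the hardware described in this paper: it assumes normalized operands, truncates instead of rounding in step 6, and reports any out-of-range exponent with a Python exception. The function name `fp_mul_sketch` and its helpers are hypothetical.

```python
import struct

BIAS = 1023
FRAC_MASK = (1 << 52) - 1

def to_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def fp_mul_sketch(a: float, b: float) -> float:
    """Multiply two normalized doubles following steps 1 to 7 (truncating)."""
    ba, bb = to_bits(a), to_bits(b)
    sa, ea, ma = ba >> 63, (ba >> 52) & 0x7FF, ba & FRAC_MASK
    sb, eb, mb = bb >> 63, (bb >> 52) & 0x7FF, bb & FRAC_MASK

    # Steps 1-2: multiply 1.M1 * 1.M2; the 106-bit product has its
    # binary point just below bit 104 and lies in [1, 4).
    sig = ((1 << 52) | ma) * ((1 << 52) | mb)
    # Step 3: add the biased exponents and remove one bias.
    exp = ea + eb - BIAS
    # Step 4: sign of the result.
    sign = sa ^ sb
    # Step 5: normalize so that the leading one sits at bit 104.
    if sig >> 105:
        sig >>= 1
        exp += 1
    # Step 6: keep 52 fraction bits (truncation, an assumption of this sketch).
    frac = (sig >> 52) & FRAC_MASK
    # Step 7: the biased exponent of a normalized result must be 1..2046.
    if not 0 < exp < 2047:
        raise OverflowError("exponent out of normalized range")
    return from_bits((sign << 63) | (exp << 52) | frac)

print(fp_mul_sketch(1.5, 2.5))
print(fp_mul_sketch(-3.0, 3.0))
```

Because the sketch truncates, its result can differ in the last bit from a correctly rounded product whenever the exact product does not fit in 53 bits.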

III.  Implementation  of  Double  Precision 
Floating  Point  Multiplier 


In this paper we implemented a double precision
floating point multiplier with exceptions and
rounding. Figure 2 shows the multiplier structure,
which includes exponent addition, significand
multiplication, and sign calculation. Figure 3
shows the multiplier, exception and rounding
blocks, which are independent and operate in
parallel.


[Figure 2. Multiplier structure; inputs: A_exponent, B_exponent, A_mantissa, B_mantissa]


A.  Multiplier 

The black box view of the double precision
floating point multiplier is shown in Figure 4. The
multiplier receives two 64-bit floating point
numbers. First these numbers are unpacked by
separating them into sign, exponent, and mantissa
bits. The sign logic is a simple XOR. The exponents
of the two numbers are added and the bias, i.e.
1023, is subtracted from the sum. The mantissa
multiplier block performs the multiplication. The
output of the mantissa multiplication is then
normalized, i.e., if the MSB of the result is not 1,
the result is left shifted to make the MSB 1.
Whenever the mantissa is shifted, the exponent has
to be adjusted accordingly.

The multiplication operation is performed in the
module (fpu_mul). The mantissa of operand A
and the leading '1' (for normalized numbers) are
stored in the 53-bit register (mul_a). The mantissa
of operand B and the leading '1' (for normalized
numbers) are stored in the 53-bit register (mul_b).
Multiplying all 53 bits of mul_a by 53 bits of mul_b
results in a 106-bit product. 53 bit by 53 bit
multipliers are not available in the most popular
Xilinx and Altera FPGAs, so the multiply is
broken down into smaller multiplies and the
results are added together to give the final
106-bit product. The module (fpu_mul) breaks up
the multiply into smaller 24-bit by 17-bit
multiplies. The Xilinx Virtex-6 device contains
DSP48E1 slices with 25 by 18 two's complement
multipliers, which can perform a 24-bit by 17-bit
unsigned multiply.

The multiply in module (fpu_mul) is broken up as
follows:
product_a = mul_a[23:0] * mul_b[16:0]
product_b = mul_a[23:0] * mul_b[33:17]


54  International  Journal  for  Modern  Trends  in  Science  and  Technology 


IJMTST 


Volume:  2 | Issue:  09  | September  2016  | ISSN:  2455-3778 


product_c = mul_a[23:0] * mul_b[50:34]
product_d = mul_a[23:0] * mul_b[52:51]
product_e = mul_a[40:24] * mul_b[16:0]
product_f = mul_a[40:24] * mul_b[33:17]
product_g = mul_a[40:24] * mul_b[52:34]
product_h = mul_a[52:41] * mul_b[16:0]
product_i = mul_a[52:41] * mul_b[33:17]
product_j = mul_a[52:41] * mul_b[52:34]
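As a quick check of this decomposition, the Python sketch below recombines the ten partial products (a to j) and compares the sum with the full 53 x 53 product. The offset of each partial product is taken to be the sum of the low bit indices of its two slices, which is an assumption consistent with the slice ranges listed above; the names `SLICES`, `bits` and `split_multiply` are hypothetical.

```python
# (a_hi, a_lo, b_hi, b_lo) for product_a .. product_j, as listed above.
SLICES = [
    (23, 0, 16, 0), (23, 0, 33, 17), (23, 0, 50, 34), (23, 0, 52, 51),
    (40, 24, 16, 0), (40, 24, 33, 17), (40, 24, 52, 34),
    (52, 41, 16, 0), (52, 41, 33, 17), (52, 41, 52, 34),
]

def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo] as an unsigned integer (Verilog-style slice)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def split_multiply(mul_a: int, mul_b: int) -> int:
    """Sum the partial products, each shifted by its combined bit offset."""
    total = 0
    for a_hi, a_lo, b_hi, b_lo in SLICES:
        partial = bits(mul_a, a_hi, a_lo) * bits(mul_b, b_hi, b_lo)
        total += partial << (a_lo + b_lo)
    return total

mul_a = (1 << 52) | 0x123456789ABCD   # leading '1' plus a 52-bit mantissa
mul_b = (1 << 52) | 0xFEDCBA9876543
print(split_multiply(mul_a, mul_b) == mul_a * mul_b)
```

The slices of mul_b tile bits 0 to 52 exactly once for each slice of mul_a, which is why the shifted sum reproduces the 106-bit product.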


[Figure 4 block: fp_double_mult. Inputs: opa(63:0), opb(63:0), rmode(1:0), clk, rst.
Outputs: out(63:0), exception, inexact, invalid, overflow, ready, underflow.]

Figure 4. Black box view of floating point double precision
multiplier

The products (a-j) are added together, with the
appropriate offsets based on which part of the
mul_a and mul_b arrays they are multiplying. The
exponent fields of operands A and B are added
together and then the value (1023) is subtracted
from the sum. If the resultant exponent is less
than 0, then the (product) register needs to be
right shifted by that amount. This value is stored
in register (exponent_under). The final exponent of
the output operand will be 0 in this case, and the
result will be a denormalized number. If
exponent_under is greater than 52, then the
mantissa will be shifted out of the product
register, the output will be 0, and the "underflow"
signal will be asserted. The mantissa output from
the (fpu_mul) module is in the 56-bit register
(product_7). The MSB is a leading '0' to allow for a
potential overflow in the rounding module. This
first '0' bit is followed by the leading '1' for
normalized numbers, or '0' for denormalized
numbers. Then the 52 bits of the mantissa follow.
Two extra bits follow the mantissa and are used for
rounding purposes. The first extra bit is taken from
the next bit after the mantissa in the 106-bit
product result of the multiply. The second extra bit
is an OR of the 52 LSBs of the 106-bit product.
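The layout of the 56-bit value can be sketched in Python as below. The exact bit positions are an assumption inferred from the description above (the 106-bit product is taken to be already shifted so that its leading bit is bit 105), and the function name `make_product_7` is hypothetical.

```python
def make_product_7(product: int) -> int:
    """Pack a 106-bit product into the 56-bit form described in the text.

    Assumed layout (MSB to LSB):
      bit 55      spare '0' for a carry out of rounding
      bits 54..2  leading bit and 52 mantissa bits (product[105:53])
      bit 1       first extra bit, product[52]
      bit 0       second extra bit, OR of product[51:0]
    """
    top = product >> 53                          # leading bit + 52 mantissa bits
    first_extra = (product >> 52) & 1            # next bit after the mantissa
    second_extra = 1 if product & ((1 << 52) - 1) else 0
    return (top << 2) | (first_extra << 1) | second_extra

p = (1 << 105) | (1 << 52) | 1
print(bin(make_product_7(p)))
```

The second extra bit plays the role usually called a sticky bit: it records whether any of the discarded low-order bits was set.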


B.  Rounding  and  Exceptions 

The IEEE standard specifies four rounding modes:
round to nearest, round to zero, round to positive
infinity, and round to negative infinity. Table 1
shows the rounding modes selected for various bit
combinations of rmode. If rounding changes the
mantissa, the corresponding change has to be made
in the exponent as well.

Table 1: Rounding modes selected for various bit
combinations of rmode

Bit combination    Rounding mode
00                 round to nearest even
01                 round to zero
10                 round up
11                 round down
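How the four rmode settings could act on a 56-bit value with two extra rounding bits is sketched below in Python. This is a generic model of the IEEE-754 rounding rules, not the paper's rounding module; the function name `round_product_7` and the assumed bit layout (spare MSB, 53 significand bits, then two extra bits) are hypothetical.

```python
ROUND_NEAREST_EVEN, ROUND_TO_ZERO, ROUND_UP, ROUND_DOWN = 0b00, 0b01, 0b10, 0b11

def round_product_7(product_7: int, sign: int, exponent: int, rmode: int):
    """Return (mantissa, exponent, inexact) after rounding per rmode."""
    mantissa = product_7 >> 2            # spare bit + leading bit + 52 bits
    first_extra = (product_7 >> 1) & 1   # bit just below the mantissa
    second_extra = product_7 & 1         # OR of all lower bits
    inexact = bool(first_extra or second_extra)
    lsb = mantissa & 1

    if rmode == ROUND_NEAREST_EVEN:
        # Round up above the halfway point, or at halfway when the LSB is odd.
        increment = bool(first_extra and (second_extra or lsb))
    elif rmode == ROUND_TO_ZERO:
        increment = False                # plain truncation
    elif rmode == ROUND_UP:
        increment = inexact and sign == 0    # toward positive infinity
    else:
        increment = inexact and sign == 1    # toward negative infinity

    if increment:
        mantissa += 1
    if mantissa >> 53:                   # carry reached the spare MSB
        mantissa >>= 1
        exponent += 1
    return mantissa, exponent, inexact

all_ones = (((1 << 53) - 1) << 2) | 0b11
print(round_product_7(all_ones, 0, 1023, ROUND_NEAREST_EVEN))
```

The all-ones example shows why the spare MSB is needed: rounding up carries out of the 53-bit significand, so the mantissa is shifted right by one and the exponent is incremented.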

In  the  exceptions  module,  all  of  the  special 
cases  are  checked  for,  and  if  they  are  found,  the 
appropriate  output  is  created,  and  the  individual 
output  signals  of  underflow,  overflow,  inexact, 
exception,  and  invalid  will  be  asserted  if  the 
conditions  for  each  case  exist. 
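The paper does not list the individual special cases, so the Python sketch below only illustrates the kind of checks an exceptions module typically performs under IEEE-754: operands are classified from their exponent and fraction fields, and the invalid case (infinity times zero) is flagged. The function names are hypothetical, and all NaNs are treated as quiet for simplicity.

```python
def classify(exponent: int, fraction: int) -> str:
    """Classify a binary64 operand from its 11-bit exponent and 52-bit fraction."""
    if exponent == 0x7FF:
        return "nan" if fraction else "inf"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"

def special_case(class_a: str, class_b: str):
    """Return (result_kind, invalid) for special operands, else None.

    None means the ordinary multiply path produces the result.
    """
    kinds = {class_a, class_b}
    if "nan" in kinds:
        return ("nan", False)      # NaN propagates (quiet NaN assumed)
    if kinds == {"inf", "zero"}:
        return ("nan", True)       # infinity * 0 is an invalid operation
    if "inf" in kinds:
        return ("inf", False)
    if "zero" in kinds:
        return ("zero", False)
    return None

print(special_case(classify(0x7FF, 0), classify(0, 0)))
```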

IV.  Results 

The double precision floating point multiplier
design was simulated in Modelsim 6.6c and
synthesized using Xilinx ISE 12.2i, targeting a
Virtex-6 FPGA. The simulation results of the
64-bit double precision floating point multiplier
are shown in Figure 5, where 'opa' and 'opb' are the
inputs and 'out' is the output. Table 2 shows the
device utilization for implementing the circuit on
the Virtex-6 FPGA. Table 3 shows the timing
summary of the double precision floating point
multiplier. Table 4 shows the area and operating
frequency of the double precision floating point
multiplier, the single precision floating point
multiplier [6] and the Xilinx core respectively.
M. Al-Ashrafy, A. Salem and W. Anis [6]
implemented a single precision floating point
multiplier that occupies an area of 604 slices and
operates at 301.114 MHz, whereas the Xilinx core
occupies an area of 266 slices and operates at
221.484 MHz. The implemented design therefore
provides a higher operating frequency than both,
together with the greater accuracy of double
precision.
Table 2: Device utilization summary (Virtex-6 6vlx75tff484-3)
of double precision floating point multiplier

Logic Utilization                         Used
Number of slice registers (Flip-Flops)    1,998
Number of slice LUTs                      2,181
Number of occupied slices                 648
Number of bonded IOBs                     203

Table 3: Timing summary of double precision floating point
multiplier

Parameter                  Value
Minimum period (ns)        2.411
Maximum Frequency (MHz)    414.714
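The two timing figures are reciprocals of each other, which can be checked with a short worked equation (the symbols $f_{\max}$ and $T_{\min}$ are introduced here for illustration only):

$$
T_{\min} = \frac{1}{f_{\max}} = \frac{1}{414.714\ \text{MHz}} \approx 2.411\ \text{ns}
$$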

Figure  5.  Simulation  results  of  double  precision  floating  point  multiplier 


Table 4: Area and operating frequency of double precision
floating point multiplier, single precision floating point
multiplier [6] and Xilinx core

Device parameters    Present Work        M. Al-Ashrafy, A. Salem and W. Anis [6]    Xilinx Core
Precision            Double precision    Single precision                           Single precision
No. of slices        648                 604                                        266

V.  Conclusion 

The double precision floating point multiplier
supports the IEEE-754 binary interchange format
and is targeted at a Xilinx Virtex-6
xc6vlx75t-3ff484 FPGA. The design achieved an
operating frequency of 414.714 MHz with an area of
648 slices. The implemented design is compared
with the single precision floating point multiplier
[6] and the Xilinx core; it provides high speed and
supports double precision, which gives more
accuracy than single precision. This design
handles overflow, underflow, and the truncation
rounding mode.